- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[OpenMP][ASan] Enable ASan Instrumentation for AMDGPUOpenMPToolChain. #124754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Amit Kumar Pandey (ampandey-1995) ChangesEnable device code ASan instrumentation for openmp offload applications using option '-fsanitize=address'. Full diff: https://github.com/llvm/llvm-project/pull/124754.diff 2 Files Affected: 
 diff --git a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
index 3f0b3f2d86b3ed..9e9ee7ed26f501 100644
--- a/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
@@ -37,6 +37,16 @@ AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
   // Lookup binaries into the driver directory, this is used to
   // discover the 'amdgpu-arch' executable.
   getProgramPaths().push_back(getDriver().Dir);
+  // Diagnose unsupported sanitizer options only once.
+  if (!Args.hasFlag(options::OPT_fgpu_sanitize, options::OPT_fno_gpu_sanitize,
+                    true))
+    return;
+  for (auto *A : Args.filtered(options::OPT_fsanitize_EQ)) {
+    SanitizerMask K = parseSanitizerValue(A->getValue(), /*AllowGroups=*/false);
+    if (K != SanitizerKind::Address)
+      D.getDiags().Report(clang::diag::warn_drv_unsupported_option_for_target)
+          << A->getAsString(Args) << getTriple().str();
+  }
 }
 
 void AMDGPUOpenMPToolChain::addClangTargetOptions(
@@ -72,9 +82,11 @@ llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
   const OptTable &Opts = getDriver().getOpts();
 
   if (DeviceOffloadKind == Action::OFK_OpenMP) {
-    for (Arg *A : Args)
-      if (!llvm::is_contained(*DAL, A))
+    for (Arg *A : Args) {
+      if (!shouldSkipSanitizeOption(*this, Args, BoundArch, A) &&
+          !llvm::is_contained(*DAL, A))
         DAL->append(A);
+    }
 
     if (!DAL->hasArg(options::OPT_march_EQ)) {
       StringRef Arch = BoundArch;
diff --git a/clang/test/Driver/amdgpu-openmp-sanitize-options.c b/clang/test/Driver/amdgpu-openmp-sanitize-options.c
new file mode 100644
index 00000000000000..03adeb8e6a7833
--- /dev/null
+++ b/clang/test/Driver/amdgpu-openmp-sanitize-options.c
@@ -0,0 +1,58 @@
+// REQUIRES: x86-registered-target, amdgpu-registered-target
+
+// Fail on invalid ROCm Path.
+// RUN:   not %clang -### -fopenmp --offload-arch=gfx908:xnack+ -fsanitize=address -fgpu-sanitize -nogpuinc --rocm-path=%S/Inputs/rocm-invalid  %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=FAIL %s
+
+// Enable multiple sanitizer's apart from ASan with invalid rocm-path.
+// RUN:   not %clang -### -fopenmp --offload-arch=gfx908:xnack+ -fsanitize=address -fsanitize=leak -fgpu-sanitize --rocm-path=%S/Inputs/rocm-invalid -nogpuinc  %s 2>&1 \
+// RUN:   | FileCheck --check-prefixes=UNSUPPORTED,FAIL %s
+
+// Memory, Leak, UndefinedBehaviour and Thread Sanitizer are not supported.
+// RUN:   %clang -### -fopenmp --offload-arch=gfx908:xnack+ -fsanitize=address -fsanitize=leak -fgpu-sanitize --rocm-path=%S/Inputs/rocm -nogpuinc  %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=UNSUPPORTED %s
+
+
+// ASan Enabled Test Cases
+// ASan enabled for amdgpu-arch [gfx908]
+// RUN:   %clang -### -fopenmp --offload-arch=gfx908 -fsanitize=address -fgpu-sanitize --rocm-path=%S/Inputs/rocm %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=NOXNACK,GPUSAN %s
+
+// ASan enabled for amdgpu-arch [gfx908:xnack-]
+// RUN:   %clang -### -fopenmp --offload-arch=gfx908:xnack- -fsanitize=address -fgpu-sanitize --rocm-path=%S/Inputs/rocm %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=XNACKNEG,GPUSAN %s
+
+// ASan enabled for amdgpu-arch [gfx908:xnack+]
+// RUN:   %clang -### -fopenmp --offload-arch=gfx908:xnack+ -fsanitize=address -fgpu-sanitize --rocm-path=%S/Inputs/rocm %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=GPUSAN %s
+
+// ASan Disabled Test Cases
+// ASan disabled for amdgpu-arch [gfx908]
+// RUN:   %clang -### -fopenmp --offload-arch=gfx908 -fsanitize=address -fno-gpu-sanitize --rocm-path=%S/Inputs/rocm %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=NOGPUSAN %s
+
+// ASan disabled for amdgpu-arch [gfx908:xnack-]
+// RUN:   %clang -### -fopenmp --offload-arch=gfx908:xnack- -fsanitize=address -fno-gpu-sanitize --rocm-path=%S/Inputs/rocm %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=NOGPUSAN %s
+
+// ASan disabled for amdgpu-arch [gfx908:xnack+]
+// RUN:   %clang -### -fopenmp --offload-arch=gfx908:xnack+ -fsanitize=address -fno-gpu-sanitize --rocm-path=%S/Inputs/rocm %s 2>&1 \
+// RUN:   | FileCheck -check-prefixes=NOGPUSAN %s
+
+// FAIL-DAG: error: cannot find ROCm device library for ABI version 5; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
+// UNSUPPORTED-DAG: warning: ignoring '-fsanitize=leak' option as it is not currently supported for target 'amdgcn-amd-amdhsa'
+
+// NOXNACK: warning: ignoring '-fsanitize=address' option for offload arch 'gfx908' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead
+// XNACKNEG: warning: ignoring '-fsanitize=address' option for offload arch 'gfx908:xnack-' as it is not currently supported there. Use it with an offload arch containing 'xnack+' instead
+
+// GPUSAN: {{"[^"]*clang[^"]*" "-cc1" "-triple" "x86_64-unknown-linux-gnu".* "-fopenmp".* "-fsanitize=address".* "-fopenmp-targets=amdgcn-amd-amdhsa".* "-x" "c".*}}
+// GPUSAN: {{"[^"]*clang[^"]*" "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu".* "-emit-llvm-bc".* "-target-cpu" "gfx908".* "-fopenmp".* "-fsanitize=address".* "-x" "c".*}}
+// GPUSAN: {{"[^"]*clang-offload-packager[^"]*" "-o".* "--image=file=.*.bc,triple=amdgcn-amd-amdhsa,arch=gfx908(:xnack\-|:xnack\+)?,kind=openmp(,feature=(\-xnack|\+xnack))?"}}
+// GPUSAN: {{"[^"]*clang[^"]*" "-cc1" "-triple" "x86_64-unknown-linux-gnu".* "-fopenmp".* "-fsanitize=address".* "-fopenmp-targets=amdgcn-amd-amdhsa".* "-x" "ir".*}}
+// GPUSAN: {{"[^"]*clang-linker-wrapper[^"]*" "--host-triple=x86_64-unknown-linux-gnu" "--linker-path=[^"]*ld.lld".* "--whole-archive" "[^"]*libclang_rt.asan_static.a".* "--whole-archive" "[^"]*libclang_rt.asan.a".*}}
+
+// NOGPUSAN: {{"[^"]*clang[^"]*" "-cc1" "-triple" "x86_64-unknown-linux-gnu".* "-fopenmp".* "-fsanitize=address".* "-fopenmp-targets=amdgcn-amd-amdhsa".* "-x" "c".*}}
+// NOGPUSAN: {{"[^"]*clang[^"]*" "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-unknown-linux-gnu".* "-emit-llvm-bc".* "-target-cpu" "gfx908".* "-fopenmp".* "-x" "c".*}}
+// NOGPUSAN: {{"[^"]*clang-offload-packager[^"]*" "-o".* "--image=file=.*.bc,triple=amdgcn-amd-amdhsa,arch=gfx908(:xnack\-|:xnack\+)?,kind=openmp(,feature=(\-xnack|\+xnack))?"}}
+// NOGPUSAN: {{"[^"]*clang[^"]*" "-cc1" "-triple" "x86_64-unknown-linux-gnu".* "-fopenmp".* "-fsanitize=address".* "-fopenmp-targets=amdgcn-amd-amdhsa".* "-x" "ir".*}}
+// NOGPUSAN: {{"[^"]*clang-linker-wrapper[^"]*" "--host-triple=x86_64-unknown-linux-gnu" "--linker-path=[^"]*ld.lld".* "--whole-archive" "[^"]*libclang_rt.asan_static.a".* "--whole-archive" "[^"]*libclang_rt.asan.a".*}}
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broad question, should we instead aim for having -fsanitize work for both sides? I've always been very much against all these *-gpu bloat flags when we could just do -Xarch_host -fsanitize or -Xarch_device -fsanitize. The word against that it usually whether or not it would force people to update some makefiles.
| 
 I'm not sure I follow the question exactly, but in my opinion it does not make sense to independently control instrumentation for host and device. We had to implement a means to control instrumentation on the device when development started and after we changed its default once development was complete we kept the option just in case. | 
| 
 I'm referring  to the fact that for a normal compile, if the user passes  | 
| 
 I agree, -fsanitize=address is supposed to enable instrumentation on both host and device...and that is exactly what happens now. If this patch is affecting the existing behavior then something is wrong. | 
| // discover the 'amdgpu-arch' executable. | ||
| getProgramPaths().push_back(getDriver().Dir); | ||
| // Diagnose unsupported sanitizer options only once. | ||
| if (!Args.hasFlag(options::OPT_fgpu_sanitize, options::OPT_fno_gpu_sanitize, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@b-sumner I'm basing it on this, but it seems this isn't a new flag, so it's probably some existing behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-fgpu-sanitize was added during development of sanitizer for AMDGPU. Now it is mainly for debugging purpose to disable sanitizer on device side. Since there are existing apps using it, we need to send a deprecation note in ROCm release before removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is the option to control GPU instrumentation and it is set by default via -fsanitze=address. Explicitly setting it allows one to disable GPU instrumentation for debugging, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We actually had a flag **-fgpu-sanitize**  from the beginning of ASan project which actually used by every toolchain(hip,openmp,opencl) to enable device code instrumentation.
Using **-fsanitize=address**  enables asan instrumentation for host code , now for device when **-fgpu-sanitize**[Default always on] can be used alongwith **-fsanitize=address** but since it value is by default on so using -fgpu-sanitize only(without -fsanitize=address) would be  a no-op effect for both host and device code.
We only use the negative -fno-gpu-sanitize to disable device code asan instrumentation leaving only the host code asan instrumented.
BTW,  other checks are also done in progression i.e in  shouldSkipSanitizeOption which checks for :xnack+  feature also in --offload-arch= option.
In short asan instrumentation for both host and device is done using -fsanitize=address  , there are inbuilt clang-driver checks for hip,openmp,opencl which if successful allows -fsanitize=address to pass forward to device toolchain cmd line job invocation otherwise it is skipped.
If any edge tests cases missing please let me know @b-sumner , @yxsamliu , @jhuber6 ?
43761a4    to
    e9c2b34      
    Compare
  
    | ping | 
| // For simplicity, we only allow -fsanitize=address | ||
| SanitizerMask K = parseSanitizerValue(A->getValue(), /*AllowGroups=*/false); | ||
| if (K != SanitizerKind::Address) | ||
| if (K != SanitizerKind::Address) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving diag to this function will cause repeated diagnostic message since the function is called for each argument.
you may want to refactor the diag handling to a common function but keep the call sites at their original locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Hi @yxsamliu & @jhuber6 , there is one complication with respect to OpenMP toolchain i.e AMDGPUOpenMPToolchain.
I am seeing diagnoseUnsupportedSanitizers is getting called twice for AMDGPUOpenMP toolchain.
In Driver.cpp  AMDGPUOpenMPToolchain objects are created twice.
1. At line number 988.
#0  clang::driver::toolchains::ROCMToolChain::diagnoseUnsupportedSanitizers (this=0x7e1ff5c21900, Args=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/ToolChains/AMDGPU.h:154
#1  0x000055555c2e0b53 in clang::driver::toolchains::AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain (this=0x7e1ff5c21900, D=..., Triple=..., HostTC=..., Args=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:41
#2  0x000055555c22489b in std::make_unique<clang::driver::toolchains::AMDGPUOpenMPToolChain, clang::driver::Driver&, llvm::Triple&, clang::driver::ToolChain const&, llvm::opt::InputArgList const&> (__parameter_pack#0=..., __parameter_pack#1=..., __parameter_pack#2=..., __parameter_pack#3=...) at /usr/include/c++/12/bits/unique_ptr.h:1065
#3  0x000055555c1fa1ca in clang::driver::Driver::CreateOffloadingDeviceToolChains (this=0x7fffffff8420, C=..., Inputs=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/Driver.cpp:988
#4  0x000055555c2006d6 in clang::driver::Driver::BuildCompilation (this=0x7fffffff8420, ArgList=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/Driver.cpp:1780
#5  0x0000555559685f47 in clang_main(int, char**, llvm::ToolContext const&) ()
#6  0x000055555969a83a in main ()
2. At line number 1058
#0  clang::driver::toolchains::ROCMToolChain::diagnoseUnsupportedSanitizers (this=0x7e1ff5c23100, Args=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/ToolChains/AMDGPU.h:154
#1  0x000055555c2e0b53 in clang::driver::toolchains::AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain (this=0x7e1ff5c23100, D=..., Triple=..., HostTC=..., Args=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:41
#2  0x000055555c22489b in std::make_unique<clang::driver::toolchains::AMDGPUOpenMPToolChain, clang::driver::Driver&, llvm::Triple&, clang::driver::ToolChain const&, llvm::opt::InputArgList const&> (__parameter_pack#0=..., __parameter_pack#1=..., __parameter_pack#2=..., __parameter_pack#3=...) at /usr/include/c++/12/bits/unique_ptr.h:1065
#3  0x000055555c1fad00 in clang::driver::Driver::CreateOffloadingDeviceToolChains (this=0x7fffffff8420, C=..., Inputs=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/Driver.cpp:1058
#4  0x000055555c2006d6 in clang::driver::Driver::BuildCompilation (this=0x7fffffff8420, ArgList=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/Driver.cpp:1780
#5  0x0000555559685f47 in clang_main(int, char**, llvm::ToolContext const&) ()
#6  0x000055555969a83a in main ()
With Hip there is only one HIPAMD toolchain.
#0  clang::driver::toolchains::ROCMToolChain::diagnoseUnsupportedSanitizers (this=0x7e1ff5c21900, Args=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/ToolChains/AMDGPU.h:154
#1  0x000055555c3e4727 in clang::driver::toolchains::HIPAMDToolChain::HIPAMDToolChain (this=0x7e1ff5c21900, D=..., Triple=..., HostTC=..., Args=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/ToolChains/HIPAMD.cpp:220
#2  0x000055555c23278a in std::make_unique<clang::driver::toolchains::HIPAMDToolChain, clang::driver::Driver const&, llvm::Triple const&, clang::driver::ToolChain const&, llvm::opt::ArgList const&> (__parameter_pack#0=..., __parameter_pack#1=..., __parameter_pack#2=..., __parameter_pack#3=...) at /usr/include/c++/12/bits/unique_ptr.h:1065
#3  0x000055555c21df4c in clang::driver::Driver::getOffloadingDeviceToolChain (this=0x7fffffff8400, Args=..., Target=..., HostTC=..., TargetDeviceOffloadKind=@0x7fffffff78f0: clang::driver::Action::OFK_HIP) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/Driver.cpp:6817
#4  0x000055555c1f9b47 in clang::driver::Driver::CreateOffloadingDeviceToolChains (this=0x7fffffff8400, C=..., Inputs=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/Driver.cpp:922
#5  0x000055555c2006d6 in clang::driver::Driver::BuildCompilation (this=0x7fffffff8400, ArgList=...) at /home/ampandey/trunk-toolchain/src-home/llvm-project/clang/lib/Driver/Driver.cpp:1780
#6  0x0000555559685f47 in clang_main(int, char**, llvm::ToolContext const&) ()
#7  0x000055555969a83a in main ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for (Arg *A : Args) { | ||
| if (!llvm::is_contained(*DAL, A)) | ||
| for (Arg *A : Args) | ||
| if (!shouldSkipSanitizeOption(*this, Args, BoundArch, A) && | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should probably go in TranslateOpenMPTargetArgs instead unless we don't expect this to be usable anywhere but AMDGPU OpenMP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I think the  scope of shouldSkipSanitizeOption is limited to AMDGPU Derived Toolchains only.
Enable device code ASan instrumentation for openmp offload applications using option '-fsanitize=address'.
e9c2b34    to
    70fdee5      
    Compare
  
    Refactor logic of processing unsupported sanitizers per language toolchain to a common 'diagnoseUnsupportedSanitizers' function.
| ✅ With the latest revision this PR passed the C/C++ code formatter. | 
70fdee5    to
    4a87761      
    Compare
  
    | This seems to break tests on my bot: http://45.33.8.238/linux/159455/step_6.txt Does the error make sense to you? | 
| Fixes #125857 | 
…llvm#124754) Enable device code ASan instrumentation for openmp offload applications using option '-fsanitize=address'.
Enable device code ASan instrumentation for openmp offload applications using option '-fsanitize=address'.